{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "35c892b0-6bf1-4929-aa17-094344f97c0f",
   "metadata": {},
   "source": [
    "# KNN分类模型\n",
    "- 分类:将一个未知归类的样本归属到某一个已知的类群中\n",
    "- 预测:可以根据数据的规律计算出一个未知的数据"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0c8e1cfa-dd57-42ee-bf2d-14e421654756",
   "metadata": {},
   "source": [
    "- 概念:\n",
    "  - 简单来说,K-近邻算法采用测量不同特征值之间的距离方法进行分类(k-Nearest Neighbor,KNN)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "327dcd30-324d-4f3e-995a-1699d1385086",
   "metadata": {},
   "source": [
    "- 欧几里得距离\n",
    "  - 欧氏距离是最常见的距离度量,衡量的是多维空间中各个点之间的绝对距离.\n",
    "  - 公式:dist(X,Y)=((xi-yi)**2)**0.5\n",
    "  - 求坐标两点距离:((x1-x2)**2+(y1-y2)**2)**0.5"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "b49162a3-54dc-4a33-a5c2-33431973f4ea",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.neighbors import KNeighborsClassifier"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "b4335a01-6051-4171-b10c-4bc789e066fa",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\Users\\21232\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\openpyxl\\worksheet\\_reader.py:329: UserWarning: Unknown extension is not supported and will be removed\n",
      "  warn(msg)\n"
     ]
    }
   ],
   "source": [
    "df = pd.read_excel('./data/my_films.xlsx')\n",
    "# 提取特征数据\n",
    "feature = df[['Action Lens','Love Lens']]\n",
    "# 提取标签数据\n",
    "target = df['target']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "af6d8b17-5362-441a-b432-ac861db2c8fa",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "((12, 2), (12,))"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "feature.shape,target.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "41093f70-b717-4c5c-89c6-1c161f25b86d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 数据集切分\n",
    "x_train,x_test,y_train,y_test = train_test_split(feature,target,test_size=0.1,random_state=2020)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "a611ee6b-2f72-497e-a070-95bd0083e695",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<style>#sk-container-id-1 {\n",
       "  /* Definition of color scheme common for light and dark mode */\n",
       "  --sklearn-color-text: black;\n",
       "  --sklearn-color-line: gray;\n",
       "  /* Definition of color scheme for unfitted estimators */\n",
       "  --sklearn-color-unfitted-level-0: #fff5e6;\n",
       "  --sklearn-color-unfitted-level-1: #f6e4d2;\n",
       "  --sklearn-color-unfitted-level-2: #ffe0b3;\n",
       "  --sklearn-color-unfitted-level-3: chocolate;\n",
       "  /* Definition of color scheme for fitted estimators */\n",
       "  --sklearn-color-fitted-level-0: #f0f8ff;\n",
       "  --sklearn-color-fitted-level-1: #d4ebff;\n",
       "  --sklearn-color-fitted-level-2: #b3dbfd;\n",
       "  --sklearn-color-fitted-level-3: cornflowerblue;\n",
       "\n",
       "  /* Specific color for light theme */\n",
       "  --sklearn-color-text-on-default-background: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, black)));\n",
       "  --sklearn-color-background: var(--sg-background-color, var(--theme-background, var(--jp-layout-color0, white)));\n",
       "  --sklearn-color-border-box: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, black)));\n",
       "  --sklearn-color-icon: #696969;\n",
       "\n",
       "  @media (prefers-color-scheme: dark) {\n",
       "    /* Redefinition of color scheme for dark theme */\n",
       "    --sklearn-color-text-on-default-background: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, white)));\n",
       "    --sklearn-color-background: var(--sg-background-color, var(--theme-background, var(--jp-layout-color0, #111)));\n",
       "    --sklearn-color-border-box: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, white)));\n",
       "    --sklearn-color-icon: #878787;\n",
       "  }\n",
       "}\n",
       "\n",
       "#sk-container-id-1 {\n",
       "  color: var(--sklearn-color-text);\n",
       "}\n",
       "\n",
       "#sk-container-id-1 pre {\n",
       "  padding: 0;\n",
       "}\n",
       "\n",
       "#sk-container-id-1 input.sk-hidden--visually {\n",
       "  border: 0;\n",
       "  clip: rect(1px 1px 1px 1px);\n",
       "  clip: rect(1px, 1px, 1px, 1px);\n",
       "  height: 1px;\n",
       "  margin: -1px;\n",
       "  overflow: hidden;\n",
       "  padding: 0;\n",
       "  position: absolute;\n",
       "  width: 1px;\n",
       "}\n",
       "\n",
       "#sk-container-id-1 div.sk-dashed-wrapped {\n",
       "  border: 1px dashed var(--sklearn-color-line);\n",
       "  margin: 0 0.4em 0.5em 0.4em;\n",
       "  box-sizing: border-box;\n",
       "  padding-bottom: 0.4em;\n",
       "  background-color: var(--sklearn-color-background);\n",
       "}\n",
       "\n",
       "#sk-container-id-1 div.sk-container {\n",
       "  /* jupyter's `normalize.less` sets `[hidden] { display: none; }`\n",
       "     but bootstrap.min.css set `[hidden] { display: none !important; }`\n",
       "     so we also need the `!important` here to be able to override the\n",
       "     default hidden behavior on the sphinx rendered scikit-learn.org.\n",
       "     See: https://github.com/scikit-learn/scikit-learn/issues/21755 */\n",
       "  display: inline-block !important;\n",
       "  position: relative;\n",
       "}\n",
       "\n",
       "#sk-container-id-1 div.sk-text-repr-fallback {\n",
       "  display: none;\n",
       "}\n",
       "\n",
       "div.sk-parallel-item,\n",
       "div.sk-serial,\n",
       "div.sk-item {\n",
       "  /* draw centered vertical line to link estimators */\n",
       "  background-image: linear-gradient(var(--sklearn-color-text-on-default-background), var(--sklearn-color-text-on-default-background));\n",
       "  background-size: 2px 100%;\n",
       "  background-repeat: no-repeat;\n",
       "  background-position: center center;\n",
       "}\n",
       "\n",
       "/* Parallel-specific style estimator block */\n",
       "\n",
       "#sk-container-id-1 div.sk-parallel-item::after {\n",
       "  content: \"\";\n",
       "  width: 100%;\n",
       "  border-bottom: 2px solid var(--sklearn-color-text-on-default-background);\n",
       "  flex-grow: 1;\n",
       "}\n",
       "\n",
       "#sk-container-id-1 div.sk-parallel {\n",
       "  display: flex;\n",
       "  align-items: stretch;\n",
       "  justify-content: center;\n",
       "  background-color: var(--sklearn-color-background);\n",
       "  position: relative;\n",
       "}\n",
       "\n",
       "#sk-container-id-1 div.sk-parallel-item {\n",
       "  display: flex;\n",
       "  flex-direction: column;\n",
       "}\n",
       "\n",
       "#sk-container-id-1 div.sk-parallel-item:first-child::after {\n",
       "  align-self: flex-end;\n",
       "  width: 50%;\n",
       "}\n",
       "\n",
       "#sk-container-id-1 div.sk-parallel-item:last-child::after {\n",
       "  align-self: flex-start;\n",
       "  width: 50%;\n",
       "}\n",
       "\n",
       "#sk-container-id-1 div.sk-parallel-item:only-child::after {\n",
       "  width: 0;\n",
       "}\n",
       "\n",
       "/* Serial-specific style estimator block */\n",
       "\n",
       "#sk-container-id-1 div.sk-serial {\n",
       "  display: flex;\n",
       "  flex-direction: column;\n",
       "  align-items: center;\n",
       "  background-color: var(--sklearn-color-background);\n",
       "  padding-right: 1em;\n",
       "  padding-left: 1em;\n",
       "}\n",
       "\n",
       "\n",
       "/* Toggleable style: style used for estimator/Pipeline/ColumnTransformer box that is\n",
       "clickable and can be expanded/collapsed.\n",
       "- Pipeline and ColumnTransformer use this feature and define the default style\n",
       "- Estimators will overwrite some part of the style using the `sk-estimator` class\n",
       "*/\n",
       "\n",
       "/* Pipeline and ColumnTransformer style (default) */\n",
       "\n",
       "#sk-container-id-1 div.sk-toggleable {\n",
       "  /* Default theme specific background. It is overwritten whether we have a\n",
       "  specific estimator or a Pipeline/ColumnTransformer */\n",
       "  background-color: var(--sklearn-color-background);\n",
       "}\n",
       "\n",
       "/* Toggleable label */\n",
       "#sk-container-id-1 label.sk-toggleable__label {\n",
       "  cursor: pointer;\n",
       "  display: block;\n",
       "  width: 100%;\n",
       "  margin-bottom: 0;\n",
       "  padding: 0.5em;\n",
       "  box-sizing: border-box;\n",
       "  text-align: center;\n",
       "}\n",
       "\n",
       "#sk-container-id-1 label.sk-toggleable__label-arrow:before {\n",
       "  /* Arrow on the left of the label */\n",
       "  content: \"▸\";\n",
       "  float: left;\n",
       "  margin-right: 0.25em;\n",
       "  color: var(--sklearn-color-icon);\n",
       "}\n",
       "\n",
       "#sk-container-id-1 label.sk-toggleable__label-arrow:hover:before {\n",
       "  color: var(--sklearn-color-text);\n",
       "}\n",
       "\n",
       "/* Toggleable content - dropdown */\n",
       "\n",
       "#sk-container-id-1 div.sk-toggleable__content {\n",
       "  max-height: 0;\n",
       "  max-width: 0;\n",
       "  overflow: hidden;\n",
       "  text-align: left;\n",
       "  /* unfitted */\n",
       "  background-color: var(--sklearn-color-unfitted-level-0);\n",
       "}\n",
       "\n",
       "#sk-container-id-1 div.sk-toggleable__content.fitted {\n",
       "  /* fitted */\n",
       "  background-color: var(--sklearn-color-fitted-level-0);\n",
       "}\n",
       "\n",
       "#sk-container-id-1 div.sk-toggleable__content pre {\n",
       "  margin: 0.2em;\n",
       "  border-radius: 0.25em;\n",
       "  color: var(--sklearn-color-text);\n",
       "  /* unfitted */\n",
       "  background-color: var(--sklearn-color-unfitted-level-0);\n",
       "}\n",
       "\n",
       "#sk-container-id-1 div.sk-toggleable__content.fitted pre {\n",
       "  /* unfitted */\n",
       "  background-color: var(--sklearn-color-fitted-level-0);\n",
       "}\n",
       "\n",
       "#sk-container-id-1 input.sk-toggleable__control:checked~div.sk-toggleable__content {\n",
       "  /* Expand drop-down */\n",
       "  max-height: 200px;\n",
       "  max-width: 100%;\n",
       "  overflow: auto;\n",
       "}\n",
       "\n",
       "#sk-container-id-1 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {\n",
       "  content: \"▾\";\n",
       "}\n",
       "\n",
       "/* Pipeline/ColumnTransformer-specific style */\n",
       "\n",
       "#sk-container-id-1 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
       "  color: var(--sklearn-color-text);\n",
       "  background-color: var(--sklearn-color-unfitted-level-2);\n",
       "}\n",
       "\n",
       "#sk-container-id-1 div.sk-label.fitted input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
       "  background-color: var(--sklearn-color-fitted-level-2);\n",
       "}\n",
       "\n",
       "/* Estimator-specific style */\n",
       "\n",
       "/* Colorize estimator box */\n",
       "#sk-container-id-1 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
       "  /* unfitted */\n",
       "  background-color: var(--sklearn-color-unfitted-level-2);\n",
       "}\n",
       "\n",
       "#sk-container-id-1 div.sk-estimator.fitted input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
       "  /* fitted */\n",
       "  background-color: var(--sklearn-color-fitted-level-2);\n",
       "}\n",
       "\n",
       "#sk-container-id-1 div.sk-label label.sk-toggleable__label,\n",
       "#sk-container-id-1 div.sk-label label {\n",
       "  /* The background is the default theme color */\n",
       "  color: var(--sklearn-color-text-on-default-background);\n",
       "}\n",
       "\n",
       "/* On hover, darken the color of the background */\n",
       "#sk-container-id-1 div.sk-label:hover label.sk-toggleable__label {\n",
       "  color: var(--sklearn-color-text);\n",
       "  background-color: var(--sklearn-color-unfitted-level-2);\n",
       "}\n",
       "\n",
       "/* Label box, darken color on hover, fitted */\n",
       "#sk-container-id-1 div.sk-label.fitted:hover label.sk-toggleable__label.fitted {\n",
       "  color: var(--sklearn-color-text);\n",
       "  background-color: var(--sklearn-color-fitted-level-2);\n",
       "}\n",
       "\n",
       "/* Estimator label */\n",
       "\n",
       "#sk-container-id-1 div.sk-label label {\n",
       "  font-family: monospace;\n",
       "  font-weight: bold;\n",
       "  display: inline-block;\n",
       "  line-height: 1.2em;\n",
       "}\n",
       "\n",
       "#sk-container-id-1 div.sk-label-container {\n",
       "  text-align: center;\n",
       "}\n",
       "\n",
       "/* Estimator-specific */\n",
       "#sk-container-id-1 div.sk-estimator {\n",
       "  font-family: monospace;\n",
       "  border: 1px dotted var(--sklearn-color-border-box);\n",
       "  border-radius: 0.25em;\n",
       "  box-sizing: border-box;\n",
       "  margin-bottom: 0.5em;\n",
       "  /* unfitted */\n",
       "  background-color: var(--sklearn-color-unfitted-level-0);\n",
       "}\n",
       "\n",
       "#sk-container-id-1 div.sk-estimator.fitted {\n",
       "  /* fitted */\n",
       "  background-color: var(--sklearn-color-fitted-level-0);\n",
       "}\n",
       "\n",
       "/* on hover */\n",
       "#sk-container-id-1 div.sk-estimator:hover {\n",
       "  /* unfitted */\n",
       "  background-color: var(--sklearn-color-unfitted-level-2);\n",
       "}\n",
       "\n",
       "#sk-container-id-1 div.sk-estimator.fitted:hover {\n",
       "  /* fitted */\n",
       "  background-color: var(--sklearn-color-fitted-level-2);\n",
       "}\n",
       "\n",
       "/* Specification for estimator info (e.g. \"i\" and \"?\") */\n",
       "\n",
       "/* Common style for \"i\" and \"?\" */\n",
       "\n",
       ".sk-estimator-doc-link,\n",
       "a:link.sk-estimator-doc-link,\n",
       "a:visited.sk-estimator-doc-link {\n",
       "  float: right;\n",
       "  font-size: smaller;\n",
       "  line-height: 1em;\n",
       "  font-family: monospace;\n",
       "  background-color: var(--sklearn-color-background);\n",
       "  border-radius: 1em;\n",
       "  height: 1em;\n",
       "  width: 1em;\n",
       "  text-decoration: none !important;\n",
       "  margin-left: 1ex;\n",
       "  /* unfitted */\n",
       "  border: var(--sklearn-color-unfitted-level-1) 1pt solid;\n",
       "  color: var(--sklearn-color-unfitted-level-1);\n",
       "}\n",
       "\n",
       ".sk-estimator-doc-link.fitted,\n",
       "a:link.sk-estimator-doc-link.fitted,\n",
       "a:visited.sk-estimator-doc-link.fitted {\n",
       "  /* fitted */\n",
       "  border: var(--sklearn-color-fitted-level-1) 1pt solid;\n",
       "  color: var(--sklearn-color-fitted-level-1);\n",
       "}\n",
       "\n",
       "/* On hover */\n",
       "div.sk-estimator:hover .sk-estimator-doc-link:hover,\n",
       ".sk-estimator-doc-link:hover,\n",
       "div.sk-label-container:hover .sk-estimator-doc-link:hover,\n",
       ".sk-estimator-doc-link:hover {\n",
       "  /* unfitted */\n",
       "  background-color: var(--sklearn-color-unfitted-level-3);\n",
       "  color: var(--sklearn-color-background);\n",
       "  text-decoration: none;\n",
       "}\n",
       "\n",
       "div.sk-estimator.fitted:hover .sk-estimator-doc-link.fitted:hover,\n",
       ".sk-estimator-doc-link.fitted:hover,\n",
       "div.sk-label-container:hover .sk-estimator-doc-link.fitted:hover,\n",
       ".sk-estimator-doc-link.fitted:hover {\n",
       "  /* fitted */\n",
       "  background-color: var(--sklearn-color-fitted-level-3);\n",
       "  color: var(--sklearn-color-background);\n",
       "  text-decoration: none;\n",
       "}\n",
       "\n",
       "/* Span, style for the box shown on hovering the info icon */\n",
       ".sk-estimator-doc-link span {\n",
       "  display: none;\n",
       "  z-index: 9999;\n",
       "  position: relative;\n",
       "  font-weight: normal;\n",
       "  right: .2ex;\n",
       "  padding: .5ex;\n",
       "  margin: .5ex;\n",
       "  width: min-content;\n",
       "  min-width: 20ex;\n",
       "  max-width: 50ex;\n",
       "  color: var(--sklearn-color-text);\n",
       "  box-shadow: 2pt 2pt 4pt #999;\n",
       "  /* unfitted */\n",
       "  background: var(--sklearn-color-unfitted-level-0);\n",
       "  border: .5pt solid var(--sklearn-color-unfitted-level-3);\n",
       "}\n",
       "\n",
       ".sk-estimator-doc-link.fitted span {\n",
       "  /* fitted */\n",
       "  background: var(--sklearn-color-fitted-level-0);\n",
       "  border: var(--sklearn-color-fitted-level-3);\n",
       "}\n",
       "\n",
       ".sk-estimator-doc-link:hover span {\n",
       "  display: block;\n",
       "}\n",
       "\n",
       "/* \"?\"-specific style due to the `<a>` HTML tag */\n",
       "\n",
       "#sk-container-id-1 a.estimator_doc_link {\n",
       "  float: right;\n",
       "  font-size: 1rem;\n",
       "  line-height: 1em;\n",
       "  font-family: monospace;\n",
       "  background-color: var(--sklearn-color-background);\n",
       "  border-radius: 1rem;\n",
       "  height: 1rem;\n",
       "  width: 1rem;\n",
       "  text-decoration: none;\n",
       "  /* unfitted */\n",
       "  color: var(--sklearn-color-unfitted-level-1);\n",
       "  border: var(--sklearn-color-unfitted-level-1) 1pt solid;\n",
       "}\n",
       "\n",
       "#sk-container-id-1 a.estimator_doc_link.fitted {\n",
       "  /* fitted */\n",
       "  border: var(--sklearn-color-fitted-level-1) 1pt solid;\n",
       "  color: var(--sklearn-color-fitted-level-1);\n",
       "}\n",
       "\n",
       "/* On hover */\n",
       "#sk-container-id-1 a.estimator_doc_link:hover {\n",
       "  /* unfitted */\n",
       "  background-color: var(--sklearn-color-unfitted-level-3);\n",
       "  color: var(--sklearn-color-background);\n",
       "  text-decoration: none;\n",
       "}\n",
       "\n",
       "#sk-container-id-1 a.estimator_doc_link.fitted:hover {\n",
       "  /* fitted */\n",
       "  background-color: var(--sklearn-color-fitted-level-3);\n",
       "}\n",
       "</style><div id=\"sk-container-id-1\" class=\"sk-top-container\"><div class=\"sk-text-repr-fallback\"><pre>KNeighborsClassifier(n_neighbors=3)</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class=\"sk-container\" hidden><div class=\"sk-item\"><div class=\"sk-estimator fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-1\" type=\"checkbox\" checked><label for=\"sk-estimator-id-1\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\">&nbsp;&nbsp;KNeighborsClassifier<a class=\"sk-estimator-doc-link fitted\" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.5/modules/generated/sklearn.neighbors.KNeighborsClassifier.html\">?<span>Documentation for KNeighborsClassifier</span></a><span class=\"sk-estimator-doc-link fitted\">i<span>Fitted</span></span></label><div class=\"sk-toggleable__content fitted\"><pre>KNeighborsClassifier(n_neighbors=3)</pre></div> </div></div></div></div>"
      ],
      "text/plain": [
       "KNeighborsClassifier(n_neighbors=3)"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 创建算法模型对象\n",
    "# n_neighbors=x:表示knn中的k\n",
    "knn = KNeighborsClassifier(n_neighbors=3)\n",
    "# 训练模型:特征数据必须是二维的\n",
    "knn.fit(x_train,y_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "de034267-27ee-4ad0-ade6-99e1169265f9",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\Users\\21232\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\base.py:493: UserWarning: X does not have valid feature names, but KNeighborsClassifier was fitted with feature names\n",
      "  warnings.warn(\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "array(['Action', 'Action'], dtype=object)"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 测试模型\n",
    "# knn.predict([12,30]) # 报错必须是二维\n",
    "knn.predict([[12,30]])\n",
    "knn.predict(x_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "1e92e4c7-93aa-4dec-b264-51c21ab2c83b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "模型分类结果: ['Action' 'Action']\n",
      "真实的结果: 2    Action\n",
      "1    Action\n",
      "Name: target, dtype: object\n"
     ]
    }
   ],
   "source": [
    "print('模型分类结果:',knn.predict(x_test))\n",
    "print('真实的结果:',y_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "6f91a612-6c37-4d44-a6ae-9642eeb27fb0",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1.0"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 模型测试\n",
    "knn.score(x_test,y_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "64a9aa37-5d59-4fc2-b473-ad6c4c327215",
   "metadata": {},
   "source": [
    "**鸢尾花分类实现**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "36a842ae-deb9-4c2a-b731-d41aae55835c",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "from sklearn.model_selection import train_test_split\n",
    "import sklearn.datasets as datasets\n",
    "from sklearn.neighbors import KNeighborsClassifier"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "070a1d38-6fb5-42db-abfd-b8fc68eb3db5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "((150, 4), (150,))"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "iris = datasets.load_iris()\n",
    "feature = iris['data']\n",
    "target = iris['target']\n",
    "feature.shape,target.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "1ba50b98-0ab5-4915-9d0d-41de475506a6",
   "metadata": {},
   "outputs": [],
   "source": [
    "x_train,x_test,y_train,y_test = train_test_split(feature,target,test_size=0.2,random_state=2020)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "3568fae2-cbe9-4f41-a2e8-39a12c27bf1b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.9666666666666667"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "knn = KNeighborsClassifier(n_neighbors=8)\n",
    "knn.fit(x_train,y_train)\n",
    "knn.score(x_test,y_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "8b575ed9-7c4d-4a8e-8cf2-0834c9cde9d8",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "模型分类结果: [2 0 1 1 1 2 2 1 0 0 2 1 0 2 2 0 1 1 2 0 0 2 1 0 2 1 1 1 0 0]\n",
      "真实的分类结果 [2 0 1 1 1 2 2 1 0 0 2 2 0 2 2 0 1 1 2 0 0 2 1 0 2 1 1 1 0 0]\n"
     ]
    }
   ],
   "source": [
    "print('模型分类结果:',knn.predict(x_test))\n",
    "print('真实的分类结果',y_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "89856231-43a6-408c-bfdf-37879ddfa5fd",
   "metadata": {},
   "source": [
    "**预测年收入是否大于50k美元**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "70b4577c-5c95-4464-8805-662f64ac4b69",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>education_num</th>\n",
       "      <th>occupation</th>\n",
       "      <th>hours_per_week</th>\n",
       "      <th>salary</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>39</td>\n",
       "      <td>13</td>\n",
       "      <td>Adm-clerical</td>\n",
       "      <td>40</td>\n",
       "      <td>&lt;=50K</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>50</td>\n",
       "      <td>13</td>\n",
       "      <td>Exec-managerial</td>\n",
       "      <td>13</td>\n",
       "      <td>&lt;=50K</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>38</td>\n",
       "      <td>9</td>\n",
       "      <td>Handlers-cleaners</td>\n",
       "      <td>40</td>\n",
       "      <td>&lt;=50K</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>53</td>\n",
       "      <td>7</td>\n",
       "      <td>Handlers-cleaners</td>\n",
       "      <td>40</td>\n",
       "      <td>&lt;=50K</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>28</td>\n",
       "      <td>13</td>\n",
       "      <td>Prof-specialty</td>\n",
       "      <td>40</td>\n",
       "      <td>&lt;=50K</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   age  education_num         occupation  hours_per_week salary\n",
       "0   39             13       Adm-clerical              40  <=50K\n",
       "1   50             13    Exec-managerial              13  <=50K\n",
       "2   38              9  Handlers-cleaners              40  <=50K\n",
       "3   53              7  Handlers-cleaners              40  <=50K\n",
       "4   28             13     Prof-specialty              40  <=50K"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_csv('./data/adults.txt')\n",
    "df = df[['age','education_num','occupation','hours_per_week','salary']]\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eb54a185-3352-40c1-9f97-908d01e97e9c",
   "metadata": {},
   "source": [
    "- 实现特征值化"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "7af4b1fc-31ac-4b3d-81ce-77566aaa90a3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "((32561, 4), (32561,))"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "col_names = df.columns[df.columns != 'salary']\n",
    "feature = df[col_names]\n",
    "target = df['salary']\n",
    "feature.shape,target.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "272f70f8-8791-47e4-9f39-fb733346fc37",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'Adm-clerical': 1,\n",
       " 'Exec-managerial': 2,\n",
       " 'Handlers-cleaners': 3,\n",
       " 'Prof-specialty': 4,\n",
       " 'Other-service': 5,\n",
       " 'Sales': 6,\n",
       " 'Craft-repair': 7,\n",
       " 'Transport-moving': 8,\n",
       " 'Farming-fishing': 9,\n",
       " 'Machine-op-inspct': 10,\n",
       " 'Tech-support': 11,\n",
       " '?': 12,\n",
       " 'Protective-serv': 13,\n",
       " 'Armed-Forces': 14,\n",
       " 'Priv-house-serv': 15}"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 生成映射关系表\n",
    "occ_count_arr = feature['occupation'].unique()\n",
    "count = 1\n",
    "dic = {}\n",
    "for occ in occ_count_arr:\n",
    "    dic[occ] = count\n",
    "    count+=1\n",
    "dic"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "21ddaa3d-1fd0-408e-9772-6d4cf4a6744a",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\Users\\21232\\AppData\\Local\\Temp\\ipykernel_4512\\4109005137.py:1: SettingWithCopyWarning: \n",
      "A value is trying to be set on a copy of a slice from a DataFrame.\n",
      "Try using .loc[row_indexer,col_indexer] = value instead\n",
      "\n",
      "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
      "  feature['occupation'] = feature['occupation'].map(dic)\n"
     ]
    }
   ],
   "source": [
    "feature['occupation'] = feature['occupation'].map(dic)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "345b71d4-6523-4c15-a058-afed85417f8a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>education_num</th>\n",
       "      <th>occupation</th>\n",
       "      <th>hours_per_week</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>39</td>\n",
       "      <td>13</td>\n",
       "      <td>1</td>\n",
       "      <td>40</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>50</td>\n",
       "      <td>13</td>\n",
       "      <td>2</td>\n",
       "      <td>13</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>38</td>\n",
       "      <td>9</td>\n",
       "      <td>3</td>\n",
       "      <td>40</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>53</td>\n",
       "      <td>7</td>\n",
       "      <td>3</td>\n",
       "      <td>40</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>28</td>\n",
       "      <td>13</td>\n",
       "      <td>4</td>\n",
       "      <td>40</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32556</th>\n",
       "      <td>27</td>\n",
       "      <td>12</td>\n",
       "      <td>11</td>\n",
       "      <td>38</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32557</th>\n",
       "      <td>40</td>\n",
       "      <td>9</td>\n",
       "      <td>10</td>\n",
       "      <td>40</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32558</th>\n",
       "      <td>58</td>\n",
       "      <td>9</td>\n",
       "      <td>1</td>\n",
       "      <td>40</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32559</th>\n",
       "      <td>22</td>\n",
       "      <td>9</td>\n",
       "      <td>1</td>\n",
       "      <td>20</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32560</th>\n",
       "      <td>52</td>\n",
       "      <td>9</td>\n",
       "      <td>2</td>\n",
       "      <td>40</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>32561 rows × 4 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "       age  education_num  occupation  hours_per_week\n",
       "0       39             13           1              40\n",
       "1       50             13           2              13\n",
       "2       38              9           3              40\n",
       "3       53              7           3              40\n",
       "4       28             13           4              40\n",
       "...    ...            ...         ...             ...\n",
       "32556   27             12          11              38\n",
       "32557   40              9          10              40\n",
       "32558   58              9           1              40\n",
       "32559   22              9           1              20\n",
       "32560   52              9           2              40\n",
       "\n",
       "[32561 rows x 4 columns]"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "feature"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "b269f4c2-303e-427f-b053-f86da3842dd9",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 样本数据切分\n",
    "x_train,x_test,y_train,y_test = train_test_split(feature,target,test_size=0.1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "8fac4562-a298-468c-bfd5-9af8b3414a86",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.7881486030089039"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "knn = KNeighborsClassifier(n_neighbors=20)\n",
    "knn.fit(x_train,y_train)\n",
    "knn.score(x_test,y_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bd39ddbe-9ba5-455a-886f-c86aa35481f7",
   "metadata": {},
   "source": [
    "- 使用onehot编码进行特征值化"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "7a0f6145-4889-4546-91c8-9a229758ad1d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>education_num</th>\n",
       "      <th>occupation</th>\n",
       "      <th>hours_per_week</th>\n",
       "      <th>salary</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>39</td>\n",
       "      <td>13</td>\n",
       "      <td>Adm-clerical</td>\n",
       "      <td>40</td>\n",
       "      <td>&lt;=50K</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>50</td>\n",
       "      <td>13</td>\n",
       "      <td>Exec-managerial</td>\n",
       "      <td>13</td>\n",
       "      <td>&lt;=50K</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>38</td>\n",
       "      <td>9</td>\n",
       "      <td>Handlers-cleaners</td>\n",
       "      <td>40</td>\n",
       "      <td>&lt;=50K</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>53</td>\n",
       "      <td>7</td>\n",
       "      <td>Handlers-cleaners</td>\n",
       "      <td>40</td>\n",
       "      <td>&lt;=50K</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>28</td>\n",
       "      <td>13</td>\n",
       "      <td>Prof-specialty</td>\n",
       "      <td>40</td>\n",
       "      <td>&lt;=50K</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   age  education_num         occupation  hours_per_week salary\n",
       "0   39             13       Adm-clerical              40  <=50K\n",
       "1   50             13    Exec-managerial              13  <=50K\n",
       "2   38              9  Handlers-cleaners              40  <=50K\n",
       "3   53              7  Handlers-cleaners              40  <=50K\n",
       "4   28             13     Prof-specialty              40  <=50K"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_csv('./data/adults.txt')\n",
    "df = df[['age','education_num','occupation','hours_per_week','salary']]\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "c2c3d37a-fe9e-4d3d-a827-ab74ee9cd172",
   "metadata": {},
   "outputs": [],
   "source": [
    "col_names = df.columns[df.columns != 'salary']\n",
    "feature = df[col_names]\n",
    "target = df['salary']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "5a741e9d-7902-40bc-a282-933f71fb095e",
   "metadata": {},
   "outputs": [],
   "source": [
    "n_df = pd.get_dummies(feature['occupation'],dtype=int)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "032d850b-90bb-4870-8e63-550c966ee68a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>education_num</th>\n",
       "      <th>hours_per_week</th>\n",
       "      <th>?</th>\n",
       "      <th>Adm-clerical</th>\n",
       "      <th>Armed-Forces</th>\n",
       "      <th>Craft-repair</th>\n",
       "      <th>Exec-managerial</th>\n",
       "      <th>Farming-fishing</th>\n",
       "      <th>Handlers-cleaners</th>\n",
       "      <th>Machine-op-inspct</th>\n",
       "      <th>Other-service</th>\n",
       "      <th>Priv-house-serv</th>\n",
       "      <th>Prof-specialty</th>\n",
       "      <th>Protective-serv</th>\n",
       "      <th>Sales</th>\n",
       "      <th>Tech-support</th>\n",
       "      <th>Transport-moving</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>39</td>\n",
       "      <td>13</td>\n",
       "      <td>40</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>50</td>\n",
       "      <td>13</td>\n",
       "      <td>13</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>38</td>\n",
       "      <td>9</td>\n",
       "      <td>40</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>53</td>\n",
       "      <td>7</td>\n",
       "      <td>40</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>28</td>\n",
       "      <td>13</td>\n",
       "      <td>40</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32556</th>\n",
       "      <td>27</td>\n",
       "      <td>12</td>\n",
       "      <td>38</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32557</th>\n",
       "      <td>40</td>\n",
       "      <td>9</td>\n",
       "      <td>40</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32558</th>\n",
       "      <td>58</td>\n",
       "      <td>9</td>\n",
       "      <td>40</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32559</th>\n",
       "      <td>22</td>\n",
       "      <td>9</td>\n",
       "      <td>20</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32560</th>\n",
       "      <td>52</td>\n",
       "      <td>9</td>\n",
       "      <td>40</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>32561 rows × 18 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "       age  education_num  hours_per_week  ?  Adm-clerical  Armed-Forces  \\\n",
       "0       39             13              40  0             1             0   \n",
       "1       50             13              13  0             0             0   \n",
       "2       38              9              40  0             0             0   \n",
       "3       53              7              40  0             0             0   \n",
       "4       28             13              40  0             0             0   \n",
       "...    ...            ...             ... ..           ...           ...   \n",
       "32556   27             12              38  0             0             0   \n",
       "32557   40              9              40  0             0             0   \n",
       "32558   58              9              40  0             1             0   \n",
       "32559   22              9              20  0             1             0   \n",
       "32560   52              9              40  0             0             0   \n",
       "\n",
       "       Craft-repair  Exec-managerial  Farming-fishing  Handlers-cleaners  \\\n",
       "0                 0                0                0                  0   \n",
       "1                 0                1                0                  0   \n",
       "2                 0                0                0                  1   \n",
       "3                 0                0                0                  1   \n",
       "4                 0                0                0                  0   \n",
       "...             ...              ...              ...                ...   \n",
       "32556             0                0                0                  0   \n",
       "32557             0                0                0                  0   \n",
       "32558             0                0                0                  0   \n",
       "32559             0                0                0                  0   \n",
       "32560             0                1                0                  0   \n",
       "\n",
       "       Machine-op-inspct  Other-service  Priv-house-serv  Prof-specialty  \\\n",
       "0                      0              0                0               0   \n",
       "1                      0              0                0               0   \n",
       "2                      0              0                0               0   \n",
       "3                      0              0                0               0   \n",
       "4                      0              0                0               1   \n",
       "...                  ...            ...              ...             ...   \n",
       "32556                  0              0                0               0   \n",
       "32557                  1              0                0               0   \n",
       "32558                  0              0                0               0   \n",
       "32559                  0              0                0               0   \n",
       "32560                  0              0                0               0   \n",
       "\n",
       "       Protective-serv  Sales  Tech-support  Transport-moving  \n",
       "0                    0      0             0                 0  \n",
       "1                    0      0             0                 0  \n",
       "2                    0      0             0                 0  \n",
       "3                    0      0             0                 0  \n",
       "4                    0      0             0                 0  \n",
       "...                ...    ...           ...               ...  \n",
       "32556                0      0             1                 0  \n",
       "32557                0      0             0                 0  \n",
       "32558                0      0             0                 0  \n",
       "32559                0      0             0                 0  \n",
       "32560                0      0             0                 0  \n",
       "\n",
       "[32561 rows x 18 columns]"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "n_feature = pd.concat((feature,n_df),axis=1).drop(labels='occupation',axis=1)\n",
    "n_feature"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "0eda1296-d39b-4ff1-94ca-5aa5e6edc3ad",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 样本数据切分\n",
    "x_train,x_test,y_train,y_test = train_test_split(n_feature,target,test_size=0.1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "32546e28-21db-4ea8-9cb9-d9c3fd3a64ac",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.7992017193736567"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "knn = KNeighborsClassifier(n_neighbors=20)\n",
    "knn.fit(x_train,y_train)\n",
    "knn.score(x_test,y_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "effe5b22-44d1-436d-9ccc-40ba465c464e",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\Users\\21232\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\base.py:493: UserWarning: X does not have valid feature names, but KNeighborsClassifier was fitted with feature names\n",
      "  warnings.warn(\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "array(['<=50K'], dtype=object)"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 分类\n",
    "knn.predict(np.array(x_test.iloc[4]).reshape(1,-1))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a5260e5d-6579-4028-ad3c-5b09c4a1c979",
   "metadata": {},
   "source": [
    "# 交叉验证\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0b0ffc34-1cd3-46cb-8d87-e3fc613496d1",
   "metadata": {},
   "source": [
    "- 实现思路\n",
    "  - 将训练数据平均分割成K等份\n",
    "  - 使用1份数据作为验证数据，其余作为训练数据\n",
    "  - 计算验证准确率\n",
    "  - 使用不同的测试集，重复2、3步骤\n",
    "  - 对准确率做平均，作为对未知数据预测准确的估计"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "87e346e9-b23b-404d-ab9a-969f4e35de47",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
